{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4.5 自动化特征选择\n",
    "<p>在添加新特征或处理一般的高纬度数据集时，最好将特征的数量减少到只包含最有用的特征，并删除其余特征。这样得到范化能力更好,更简单的模型。有三种方法判断特征的作用有多大.分别是:①单变量统计,②基于模型的选择。③迭代选择</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.1.单变量统计"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<P>在单变量统计中,我们计算每个特征和目标值之间的关系是否存在统计显著性,然后选择具有最高置信度的特征</P>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_train.shape:\n",
      "(284, 80)\n",
      "X_test.shape:\n",
      "(284, 40)\n"
     ]
    }
   ],
   "source": [
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.feature_selection import SelectPercentile\n",
    "from sklearn.model_selection import train_test_split\n",
    "import numpy as np\n",
    "\n",
    "cancer = load_breast_cancer()\n",
    "# 获得确定性的随机数\n",
    "rng = np.random.RandomState(42)\n",
    "noise = rng.normal(size=(len(cancer.data),50))\n",
    "# 向数据中添加噪声特征\n",
    "# 向前30个特征来自数据集,后50个是噪声\n",
    "X_w_noise = np.hstack([cancer.data,noise])\n",
    "\n",
    "X_train,X_test,y_train,y_test = train_test_split(X_w_noise,cancer.target,random_state=0,test_size=0.5)\n",
    "# 使用f_classif(默认值)和SelectPercentile来选择50%的特征\n",
    "select = SelectPercentile(percentile=50)\n",
    "select.fit(X_train,y_train)\n",
    "# 对训练集进行变换\n",
    "X_train_selected = select.transform(X_train)\n",
    "\n",
    "print(\"X_train.shape:\\n{}\".format(X_train.shape))\n",
    "print(\"X_test.shape:\\n{}\".format(X_train_selected.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ True  True  True  True  True  True  True  True  True False  True False\n",
      "  True  True  True  True  True  True False False  True  True  True  True\n",
      "  True  True  True  True  True  True False False False  True False  True\n",
      " False False  True False False False False  True False False  True False\n",
      " False  True False  True False False False False False False  True False\n",
      "  True False False False False  True False  True False False False False\n",
      "  True  True False  True False False False False]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Text(0.5, 0, 'Sample index')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA6oAAAA4CAYAAAD0OgXLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAL/klEQVR4nO3df4xlZX3H8feHBURAw09bwq+VFkFKdGGXHxaDCGgBqZgGW6k00H82ptsEU0kDlVYhmtAm1Tb1B2wVoUFBoFQJpS0UukWwyu4CdYFFQLL86CLs2iKKzeLCt3+cM93pdJadmTvDvc/u+5VM7jnPOfc8z5zvuffOd57nOTdVhSRJkiRJo2K7YTdAkiRJkqTxTFQlSZIkSSPFRFWSJEmSNFJMVCVJkiRJI8VEVZIkSZI0UkxUJUmSJEkjpalENckpSb6f5LEkFwy7PXp1Sa5I8lySB8aV7ZHktiSP9o+7D7ONmlyS/ZP8S5LVSR5Mcl5fbvxGXJKdktyT5N/72F3clxu7hiSZl+S+JDf368avEUnWJFmV5P4kK/oy49eAJLsluSHJw/3n3zuMXRuSHNK/5sZ+XkjyUePXtmYS1STzgM8DpwKHAWclOWy4rdIWXAmcMqHsAuD2qjoYuL1f1+jZCHysqt4KHAss6V9vxm/0bQBOrKq3AwuAU5Ici7FrzXnA6nHrxq8t766qBVW1qF83fm34S+Afq+pQ4O10r0Fj14Cq+n7/mlsALAR+Bvwdxq9pzSSqwNHAY1X1eFW9BFwLnDHkNulVVNWdwH9OKD4DuKpfvgr4wGvaKE1JVT1TVff2yz+h+7DeF+M38qrz0351h/6nMHbNSLIf8D7gS+OKjV/bjN+IS/JG4HjgywBV9VJVPY+xa9FJwA+q6gmMX9NaSlT3BZ4at/50X6a2/EJVPQNdMgS8acjt0RYkmQ8cAXwX49eEftjo/cBzwG1VZeza8hfAHwKvjCszfu0o4NYkK5Ms7suM3+g7CFgHfKUfdv+lJLtg7Fr0IeCaftn4NaylRDWTlNVr3gppG5JkV+BvgY9W1QvDbo+mpqpe7oc/7QccneTwYbdJU5PkdOC5qlo57LZoxo6rqiPppiotSXL8sBukKdkeOBL4YlUdAbyIw0Sbk2RH4P3A9cNuiwbXUqL6NLD/uPX9gLVDaotm7tkk+wD0j88NuT3ajCQ70CWpX62qG/ti49eQftjaMrq54sauDccB70+yhm6Ky4lJrsb4NaOq1vaPz9HNkTsa49eCp4Gn+xEoADfQJa7Gri2nAvdW1bP9uvFrWEuJ6nLg4CRv7v9b8iHgpiG3SdN3E3BOv3wO8M0htkWbkSR083RWV9Vnxm0yfiMuyd5JduuXXw+cDDyMsWtCVV1YVftV1Xy6z7k7qupsjF8TkuyS5A1jy8B7gQcwfiOvqn4IPJXkkL7oJOAhjF1rzmLTsF8wfk1LVTujZ5OcRjd3Zx5wRVV9eshN0qtIcg1wArAX8CzwCeAbwHXAAcCTwAerauINlzRkSd4JfAtYxaZ5cn9EN0/V+I2wJG+ju2HEPLp/Rl5XVZck2RNj15QkJwDnV9Xpxq8NSQ6i60WFbijp16rq08avDUkW0N3EbEfgceB36d9HMXYjL8nOdPezOaiqftyX+dprWFOJqiRJkiRp69fS0F9JkiRJ0jbARFWSJEmSNFJMVCVJkiRJI8VEVZIkSZI0UkxUJUmSJEkjZaBENckeSW5L8mj/uPur7DsvyX1Jbh6wzsWDPF/DZfzaZezaZvzaZvzaZezaZvzaZezaN2iP6gXA7VV1MHB7v7455wGrB6wPwIuubcavXcaubcavbcavXcaubcavXcaucYMmqmfQfbE8/eMHJtspyX7A++i+RFmSJEmSpM1KVc38ycmPgXuA+cAa4Kiq2m3CPvsDK4EX+6IXq+rwKR5/5o3TrFu4cOG09l+5cuWcHHs6x93aTTcmU+U5bttk18W6devYe++9X7M2TPcamqv3gFF4jcxGG0Y5fqNwjkelHdN57Y3COW7R1v7eMpdG4e+y1/K8zdb75lz9TbQ1nOPZsGbNGtavX5/Jtm0xUU3yz8AvTrLp48DXgU9W1aVJLuiXd5rw/N8Bfr2qfjPJqcD1wNFV9dBm6lvMpq76ts70Vm66/9RIJr3mBj72dI67tRvkH02vxnPctrm6LqZjutfQXL0HjMJrZBTiMV2j8PvN5TU0V+1o7Tpu0db+3jKXvJZnZq7+JvIcdxYtWsSKFSsm/QW339KTq+rkzW1Lsh3wD0lOAT4CvC7JBVV16bjdDgN+NckLwM7APOArwDGbqW8psLQ/fltnWpIkSZI0sEHnqL4CnAZ8HrgO2ACcleSwsR2q6kK6HtK7gbP7feZt7oBJFidZkWTFgG2TJEmSJDVoiz2qWxj6uwH4DWBfYAmwA/AMcHaSBVV1Wr/vx4C3AFcCG4G9kuxTVc9MPKg9qpIkSZK0bRt06O8P6ZLKPwfeA/wN3VDfH41LUgF2Ap4FPkv3FTXX0iW3/y9RnTBHVZIkSZK0jdlioroFNwFn0vWs3twf7yngzWM7pJv9eyDwbeBbwHeAl4FJe0vtUZUkSZKkbdugc1QvBX4J2At4BDgUOAB4Y5Jb+n2Oo+s9PQa4A/g53dzWtZMd0DmqkiRJkrRtGyhRraof0c07fZnuRkn/1W/aODb0t6ruAk6n+x7VL9Mlqmsnm5/a77+0qhZV1aJB2iZJkiRJatOgQ3+h6x19Hvgnurv5Pgn8d5KPAFTVZcD9dL2uS4CfAZfNQr2SJEmSpK3QbCSqPwB2BH4N+A/gMWBVn6COuZxuHuu76G6U9PjmDubNlCRJkiRp25aqwe5XlOQdwOeAN9D1qD4GLKMfBlxVlyX5CbAz3RDheXS9sB+sqm9s4djrgCcmFO8FrB+o0Rom49cuY9c249c249cuY9c249cuY9eGA6tq78k2zEaiuj3djZROoutRXQ78dlU9uJn9rwRurqobZljfCuevtsv4tcvYtc34tc34tcvYtc34tcvYtW/gob9VtTHJ77NpjuoVVfXghDmqkiRJkiRNyWzMUaWqbgFumVA2aYJaVefORp2SJEmSpK3ToN+jOgxLh90ADcT4tcvYtc34bUGSjyd5MMn3ktyf5Jg5rm9ZkqkOS1ua5JIkJ0+zjjVJ9ppB8zR7fO21zfi1y9g1buA5qpIkta6/MeBngBOqakOf3O1YVWvnsM5lwPlVtWIO61gDLKoqbygiSWpKiz2qkiTNtn2A9VW1AaCq1o8lqUn+JMnyJA8kWZokffmyJJ9NcmeS1UmOSnJjkkeTfKrfZ36Sh5Nc1ffU3pBk54mVJ3lvkn9Lcm+S65PsOsk+VyY5s19ek+Tifv9VSQ7ty/dMcmuS+5JcDmTc889Ock/fW3x5knl9m7+XZKcku/Q9yofP/umVJGl6TFQlSYJbgf2TPJLkC0neNW7b56rqqKo6HHg9cPq4bS9V1fHAZcA3gSXA4cC5Sfbs9zkEWFpVbwNeAH5vfMV97+1FwMlVdSSwAviDKbR5fb//F4Hz+7JPAHdV1RHATcABfR1vBX4LOK6qFtB9XdyHq2p5v9+ngD8Drq6qB6ZQtyRJc8pEVZK0zauqnwILgcXAOuDrSc7tN787yXeTrAJOBH5l3FNv6h9XAQ9W1TN9r+zjwP79tqeq6u5++WrgnROqPxY4DLg7yf3AOcCBU2j2jf3jSmB+v3x8XwdV9ff032lO9xVyC4HlfR0nAQf12y4B3gMsoktWJUkaulm5668kSa2rqpeBZcCyPik9J8m1wBfo5nk+leSTwE7jnrahf3xl3PLY+thn7MSbQUxcD3BbVZ01zSaP1fcy//fzfLKbTwS4qqounGTbHsCuwA50v9uL02yHJEmzzh5VSdI2L8khSQ4eV7QAeIJNSen6ft7omTM4/AH9zZoAzgLumrD9O8BxSX65b8vOSd4yg3oA7gQ+3B/nVGD3vvx24Mwkb+q37ZFkrNd2KfDHwFeBP51hvZIkzSp7VCVJ6noU/yrJbsBG4DFgcVU9n+Sv6Yb2rgGWz+DYq+l6Zy8HHqWbU/q/qmpdP8z4miSv64svAh6ZQV0X98e5F/hX4Mm+joeSXATcmmQ74OfAkn4u7saq+lqSecC3k5xYVXfMoG5JkmaNX08jSdIcSTIfuLm/EZMkSZoih/5KkiRJkkaKPaqSJEmSpJFij6okSZIkaaSYqEqSJEmSRoqJqiRJkiRppJioSpIkSZJGiomqJEmSJGmkmKhKkiRJkkbK/wDvDXNz4+oI/wAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 1152x144 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "mask = select.get_support() # 获取特征信息\n",
    "print(mask)\n",
    "# 将遮罩可视化黑为True，白为false\n",
    "plt.matshow(mask.reshape(1,-1),cmap='gray_r')\n",
    "plt.xlabel(\"Sample index\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score with all features:0.919\n",
      "Score with only selected features0.923\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E:\\myAnaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n",
      "E:\\myAnaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n"
     ]
    }
   ],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "# 对测试集进行变换\n",
    "X_test_selected = select.transform(X_test)\n",
    "\n",
    "lr = LogisticRegression()\n",
    "lr.fit(X_train,y_train)\n",
    "print(\"Score with all features:{:.3f}\".format(lr.score(X_test,y_test)))\n",
    "lr.fit(X_train_selected,y_train)\n",
    "print(\"Score with only selected features{:.3f}\".format(lr.score(X_test_selected,y_test)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.2 基于模型的特征选择\n",
    "<p>基于模型的特征选择使用一个监督机器学习模型来判断每个特征的重要性,并且仅保留最重要的特征。用于特征选择的监督模型不需要与用于最终监督建模的模型相同。特征选择模型需要为每个特征提供某种重要性度量,以便使用这个度量对特征进行排序。据册数和基于决策树的模型提供了feature_importances_属性，线性模型的系数绝对值也可以用于表示特征的重要性</p>\n",
    "<P>要想使用基于模型的特征选择,我们需要使用SelectFromModel变换器</P>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_train.shape:(284, 80)\n",
      "X_train_l1.shape:(284, 40)\n"
     ]
    }
   ],
   "source": [
    "from sklearn.feature_selection import SelectFromModel\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "select = SelectFromModel(\n",
    "    RandomForestClassifier(n_estimators=100,random_state=42),\n",
    "    threshold='median' # 以中位数作为阈值,此时选出一半特征\n",
    ")\n",
    "select.fit(X_train,y_train) # 构建特征选择器\n",
    "X_train_l1 = select.transform(X_train)\n",
    "print(f\"X_train.shape:{X_train.shape}\")\n",
    "print(f\"X_train_l1.shape:{X_train_l1.shape}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 0, 'Sample index')"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA6oAAAA4CAYAAAD0OgXLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAL4klEQVR4nO3dfaxlVXnH8e+PAURAw6st4W2kRZASHZjhxWIQAS0gFdNgK5UG+s/EdJpgKmmg0ipEE9qk2qa+wKgIDQoCpUoobaHQKYJVZgaoAwwCkuGlIMzYIorN4MDTP/aacnu9l7n3nns9Z8/9fpLJ2XvtdfZac559zslz11r7pKqQJEmSJGlUbDPsDkiSJEmSNJaJqiRJkiRppJioSpIkSZJGiomqJEmSJGmkmKhKkiRJkkaKiaokSZIkaaT0KlFNclKS7yV5JMl5w+6PXl2Sy5I8m+S+MWW7JbklycPtcddh9lETS7Jvkn9NsjbJ/UnOaeXGb8Ql2SHJXUn+o8XuwlZu7HokyYIk9yS5se0bv55Isi7JmiT3JlnVyoxfDyTZJcl1SR5s339vM3b9kOSg9p7b/O/5JB82fv3Wm0Q1yQLgs8DJwCHAGUkOGW6vtAWXAyeNKzsPuLWqDgRubfsaPZuAj1TVm4GjgWXt/Wb8Rt9G4PiqeiuwCDgpydEYu745B1g7Zt/49cs7q2pRVS1p+8avH/4a+KeqOhh4K9170Nj1QFV9r73nFgGLgZ8Cf4/x67XeJKrAkcAjVfVoVb0IXA2cNuQ+6VVU1e3Af40rPg24om1fAbzvF9opTUlVPV1Vd7ftH9N9We+N8Rt51flJ292u/SuMXW8k2Qd4D/DFMcXGr9+M34hL8nrgWOBLAFX1YlU9h7HroxOA71fVYxi/XutToro38MSY/Sdbmfrll6rqaeiSIeANQ+6PtiDJQuAw4DsYv15o00bvBZ4FbqkqY9cvfwX8MfDymDLj1x8F3JxkdZKlrcz4jb4DgPXAl9u0+y8m2Qlj10cfAK5q28avx/qUqGaCsvqF90KaR5LsDPwd8OGqen7Y/dHUVNVLbfrTPsCRSQ4ddp80NUlOBZ6tqtXD7otm7JiqOpxuqdKyJMcOu0Oakm2Bw4HPV9VhwAs4TbR3kmwPvBe4dth90eD6lKg+Cew7Zn8f4Kkh9UUz90ySvQDa47ND7o8mkWQ7uiT1K1V1fSs2fj3Spq2toFsrbuz64RjgvUnW0S1xOT7JlRi/3qiqp9rjs3Rr5I7E+PXBk8CTbQYKwHV0iaux65eTgbur6pm2b/x6rE+J6krgwCRvbH8t+QBww5D7pOm7ATirbZ8FfGOIfdEkkoRunc7aqvrUmEPGb8Ql2TPJLm37tcCJwIMYu16oqvOrap+qWkj3PXdbVZ2J8euFJDsled3mbeDdwH0Yv5FXVT8AnkhyUCs6AXgAY9c3Z/DKtF8wfr2Wqv7Mnk1yCt3anQXAZVX1ySF3Sa8iyVXAccAewDPAx4CvA9cA+wGPA++vqvE3XNKQJXk78E1gDa+sk/sTunWqxm+EJXkL3Q0jFtD9MfKaqrooye4Yu15JchxwblWdavz6IckBdKOo0E0l/WpVfdL49UOSRXQ3MdseeBT4fdrnKMZu5CXZke5+NgdU1Y9ame+9HutVoipJkiRJ2vr1aeqvJEmSJGkeMFGVJEmSJI0UE1VJkiRJ0kgxUZUkSZIkjRQTVUmSJEnSSBkoUU2yW5JbkjzcHnd9lboLktyT5MYB21w6yPM1XMavv4xdvxm/fjN+/WXs+s349Zex679BR1TPA26tqgOBW9v+ZM4B1g7YHoAXXb8Zv/4ydv1m/PrN+PWXses349dfxq7nBk1UT6P7YXna4/smqpRkH+A9dD+iLEmSJEnSpFJVM39y8iPgLmAhsA44oqp2GVdnX2A18EIreqGqDp3i+WfeOfXK4sWLp1x39erVW20fpNkw0bW8fv169txzz58rn861PJ33yHSNSj+GbbqfLXP1WsxVPObys3MUrotRuY5H4XtyazHqn51bs0Ff48liN5e8LqZv3bp1bNiwIRMd22KimuRfgF+e4NBHga8BH6+qi5Oc17Z3GPf83wN+s6p+O8nJwLXAkVX1wCTtLeWVoXojOE9M5w8myYTX8lbRB2k2zNW1PMgfNvvSj2Gb7mfLXL0WcxWPufzsHIXrYlSu41H4ntzajUqst2Z9fI372OdhW7JkCatWrZrwhdt2S0+uqhMnO5ZkG+Afk5wEfAh4TZLzquriMdUOAX49yfPAjsAC4MvAUZO0txxY3s5vBCVJkiRpnhl0jerLwCnAZ4FrgI3AGUkO2Vyhqs6nGyG9Eziz1Vkw2QmTLE2yKsmqAfsmSZIkSeqhLY6obmHq70bgt4C9gWXAdsDTwJlJFlXVKa3uR4A3AZcDm4A9kuxVVU+PP6kjqpIkSZI0vw069fcHdEnlXwLvAv6WbqrvD8ckqQA7AM8An6b7iZqr6ZLbn0tUx61RlSRJkiTNM1tMVLfgBuB0upHVG9v5ngDeuLlCulXF+wPfAr4JfBt4CZhwtNQRVUmSJEma3wZdo3ox8CvAHsBDwMHAfsDrk9zU6hxDN3p6FHAb8DO6ta1PTXRC16hKkiRJ0vw2UKJaVT+kW3f6Et2Nkv67Hdq0eepvVd0BnEr3O6pfoktUn5pofWqrv7yqllTVkkH6JkmSJEnqp0Gn/kI3Ovoc8M90d/N9HPifJB8CqKpLgHvpRl2XAT8FLpmFdiVJkiRJW6HZSFS/D2wP/Abwn8AjwJqWoG52Kd061nfQ3Sjp0clO5s2UJEmSJGl+S9Vg9ytK8jbgM8Dr6EZUHwFW0KYBV9UlSX4M7Eg3RXgB3Sjs+6vq61s493rgsXHFewAbBuq0hsn49Zex6zfj12/Gr7+MXb8Zv/4ydv2wf1XtOdGB2UhUt6W7kdIJdCOqK4Hfrar7J6l/OXBjVV03w/ZWuX61v4xffxm7fjN+/Wb8+svY9Zvx6y9j138DT/2tqk1J/pBX1qheVlX3j1ujKkmSJEnSlMzGGlWq6ibgpnFlEyaoVXX2bLQpSZIkSdo6Dfo7qsOwfNgd0ECMX38Zu34zfluQ5KNJ7k/y3ST3JjlqjttbkWSq09KWJ7koyYnTbGNdkj1m0D3NHt97/Wb8+svY9dzAa1QlSeq7dmPATwHHVdXGltxtX1VPzWGbK4Bzq2rVHLaxDlhSVd5QRJLUK30cUZUkabbtBWyoqo0AVbVhc5Ka5M+SrExyX5LlSdLKVyT5dJLbk6xNckSS65M8nOQTrc7CJA8muaKN1F6XZMfxjSd5d5J/T3J3kmuT7DxBncuTnN621yW5sNVfk+TgVr57kpuT3JPkUiBjnn9mkrvaaPGlSRa0Pn83yQ5JdmojyofO/ssrSdL0mKhKkgQ3A/smeSjJ55K8Y8yxz1TVEVV1KPBa4NQxx16sqmOBS4BvAMuAQ4Gzk+ze6hwELK+qtwDPA38wtuE2ensBcGJVHQ6sAv5oCn3e0Op/Hji3lX0MuKOqDgNuAPZrbbwZ+B3gmKpaRPdzcR+sqpWt3ieAvwCurKr7ptC2JElzykRVkjTvVdVPgMXAUmA98LUkZ7fD70zynSRrgOOBXxvz1Bva4xrg/qp6uo3KPgrs2449UVV3tu0rgbePa/5o4BDgziT3AmcB+0+h29e3x9XAwrZ9bGuDqvoH2m+a0/2E3GJgZWvjBOCAduwi4F3AErpkVZKkoZuVu/5KktR3VfUSsAJY0ZLSs5JcDXyObp3nE0k+Duww5mkb2+PLY7Y372/+jh1/M4jx+wFuqaozptnlze29xP//Pp/o5hMBrqiq8yc4thuwM7Ad3f/thWn2Q5KkWeeIqiRp3ktyUJIDxxQtAh7jlaR0Q1s3evoMTr9fu1kTwBnAHeOOfxs4Jsmvtr7smORNM2gH4Hbgg+08JwO7tvJbgdOTvKEd2y3J5lHb5cCfAl8B/nyG7UqSNKscUZUkqRtR/JskuwCbgEeApVX1XJIv0E3tXQesnMG519KNzl4KPEy3pvT/VNX6Ns34qiSvacUXAA/NoK0L23nuBv4NeLy18UCSC4Cbk2wD/AxY1tbibqqqryZZAHwryfFVddsM2pYkadb48zSSJM2RJAuBG9uNmCRJ0hQ59VeSJEmSNFIcUZUkSZIkjRRHVCVJkiRJI8VEVZIkSZI0UkxUJUmSJEkjxURVkiRJkjRSTFQlSZIkSSPFRFWSJEmSNFL+F/isWHNYcxLhAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 1152x144 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "mask = select.get_support()\n",
    "# 将遮罩可视化-- 黑色为True,白色为False\n",
    "plt.matshow(mask.reshape(1,-1),cmap='gray_r')\n",
    "plt.xlabel(\"Sample index\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test Score:0.930\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E:\\myAnaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n"
     ]
    }
   ],
   "source": [
    "X_test_l1 = select.transform(X_test)\n",
    "score=LogisticRegression().fit(X_train_l1,y_train).score(X_test_l1,y_test)\n",
    "print(\"Test Score:{:.3f}\".format(score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.3 迭代特征选择\n",
    "<p>以迭代特征消除法(RFE)为例,他从所有特征开始构建模型,并根据模型舍弃最不重要的特征，然后使用被舍弃特征之外的所有特征开始构建模型，如此继续，直到仅剩预设特征数量的特征。为了运行该方法,我们需要用于选择的模型提供某种用于度量的方法</p>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 0, 'Sample index')"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA6oAAAA4CAYAAAD0OgXLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAL30lEQVR4nO3df4xlZX3H8feHBURAw09bwq+VFkFKdGGXHxaDCGgBqZgGW6k00H82ptsEU0kDlVYhmtAm1Tb1B6yK0KAgUFSCtIVCtwhW2V2gLrAISJYfZYVdW0SxWVz49o/zTHc6zrCzc2e497DvVzK55zzn3PM8c7/nzM13nuc5J1WFJEmSJEmjYpthN0CSJEmSpPFMVCVJkiRJI8VEVZIkSZI0UkxUJUmSJEkjxURVkiRJkjRSTFQlSZIkSSOlV4lqkpOS/CDJI0nOG3Z79PKSXJbkmST3jSvbLcktSR5ur7sOs42aXJJ9k/xrktVJ7k9yTis3fiMuyQ5J7kryHy12F7ZyY9cjSeYluSfJjW3d+PVEkjVJViW5N8mKVmb8eiDJLkmuS/Jg+/57m7HrhyQHtWtu7Oe5JB82fv3Wm0Q1yTzgs8DJwCHAGUkOGW6rtBmXAydNKDsPuLWqDgRubesaPRuBj1TVm4GjgSXtejN+o28DcHxVvRVYAJyU5GiMXd+cA6wet278+uWdVbWgqha1dePXD38L/FNVHQy8le4aNHY9UFU/aNfcAmAh8HPg6xi/XutNogocCTxSVY9W1QvA1cBpQ26TXkZV3Q7814Ti04Ar2vIVwPte0UZpWqpqbVXd3ZZ/SvdlvTfGb+RV52dtdbv2Uxi73kiyD/Ae4Ivjio1fvxm/EZfk9cCxwJcAquqFqnoWY9dHJwA/rKrHMH691qdEdW/giXHrT7Yy9cuvVNVa6JIh4A1Dbo82I8l84DDgexi/XmjDRu8FngFuqSpj1y9/A/wp8NK4MuPXHwXcnGRlksWtzPiNvgOAdcCX27D7LybZCWPXRx8ArmrLxq/H+pSoZpKyesVbIW1FkuwM/APw4ap6btjt0fRU1Ytt+NM+wJFJDh12mzQ9SU4FnqmqlcNui2bsmKo6nG6q0pIkxw67QZqWbYHDgc9X1WHA8zhMtHeSbA+8F7h22G3R4PqUqD4J7DtufR/gqSG1RTP3dJK9ANrrM0Nuj6aQZDu6JPUrVXV9KzZ+PdKGrS2jmytu7PrhGOC9SdbQTXE5PsmVGL/eqKqn2uszdHPkjsT49cGTwJNtBArAdXSJq7Hrl5OBu6vq6bZu/HqsT4nqcuDAJG9s/y35AHDDkNukLXcDcFZbPgv45hDboikkCd08ndVV9alxm4zfiEuyZ5Jd2vJrgROBBzF2vVBV51fVPlU1n+577raqOhPj1wtJdkryurFl4N3AfRi/kVdVPwKeSHJQKzoBeABj1zdnsGnYLxi/XktVf0bPJjmFbu7OPOCyqvrkkJukl5HkKuA4YA/gaeBjwDeAa4D9gMeB91fVxBsuaciSvB34NrCKTfPk/oxunqrxG2FJ3kJ3w4h5dP+MvKaqLkqyO8auV5IcB5xbVacav35IcgBdLyp0Q0m/WlWfNH79kGQB3U3MtgceBf6Q9ncUYzfykuxIdz+bA6rqJ63Ma6/HepWoSpIkSZJe/fo09FeSJEmStBUwUZUkSZIkjRQTVUmSJEnSSDFRlSRJkiSNFBNVSZIkSdJIGShRTbJbkluSPNxed32ZfecluSfJjQPWuXiQ92u4jF9/Gbt+M379Zvz6y9j1m/HrL2PXf4P2qJ4H3FpVBwK3tvWpnAOsHrA+AE+6fjN+/WXs+s349Zvx6y9j12/Gr7+MXc8NmqieRvdgedrr+ybbKck+wHvoHqIsSZIkSdKUUlUzf3PyE+AuYD6wBjiiqnaZsM++wErg+Vb0fFUdOs3jz7xx0ghZuHDhtPdduXLlHLZEW4PJzrd169ax5557/lL5lpxvW3Iea3ZNFT9tMld/O+fq7/dcXk+j8FnMlS393QZt86vp2pur83NUzvuJhhG7Uf0spvJKX0+TWbNmDevXr89k2zabqCb5F+BXJ9n0UeBrwMer6uIk57XlHSa8/w+A366q301yMnAtcGRVPTBFfYvZ1FU//AhKs2BL/iGUTHqtStM2V+fbIP/YlObaXP3t7OP1NAqfxVzZ0t9tFNo8Kubq/ByV834U9O2zGIXradGiRaxYsWLShmw7jQadONW2JNsA/5jkJOBDwGuSnFdVF4/b7RDgN5M8B+wIzAO+DBw1RX1LgaXt+MOPoCRJkiTpFTXoHNWXgFOAzwLXABuAM5IcMrZDVZ1P10N6J3Bm22feVAdMsjjJiiQrBmybJEmSJKmHNtujupmhvxuA3wH2BpYA2wFrgTOTLKiqU9q+HwHeBFwObAT2SLJXVa2deFB7VCVJkiRp6zbo0N8f0SWVfw28C/h7uqG+Px6XpALsADwNfJruETVX0yW3v5SoTpijKkmSJEnaymw2Ud2MG4DT6XpWb2zHewJ449gO6Wbp7g98B/g28F3gRWDS3lJ7VCVJkiRp6zboHNWLgV8D9gAeAg4G9gNen+Smts8xdL2nRwG3Ab+gm9v61GQHdI6qJEmSJG3dBkpUq+rHdPNOX6S7UdJ/t00bx4b+VtUdwKl0z1H9El2i+tRk81Pb/kuralFVLRqkbZIkSZKkfhp06C90vaPPAv9Mdzffx4H/SfIhgKq6BLiXrtd1CfBz4JJZqFeSJEmS9Co0G4nqD4Htgd8C/hN4BFjVEtQxl9LNY30H3Y2SHp3qYN5MSZIkSZK2bqka7H5FSd4GfAZ4HV2P6iPAMtow4Kq6JMlPgR3phgjPo+uFfX9VfWMzx14HPDaheA9g/UCN1jAZv/4ydv1m/PrN+PWXses349dfxq4f9q+qPSfbMBuJ6rZ0N1I6ga5HdTnw+1V1/xT7Xw7cWFXXzbC+Fc5f7S/j11/Grt+MX78Zv/4ydv1m/PrL2PXfwEN/q2pjkj9m0xzVy6rq/glzVCVJkiRJmpbZmKNKVd0E3DShbNIEtarOno06JUmSJEmvToM+R3UYlg67ARqI8esvY9dvxm8zknw0yf1Jvp/k3iRHzXF9y5JMd1ja0iQXJTlxC+tYk2SPGTRPs8drr9+MX38Zu54beI6qJEl9124M+CnguKra0JK77avqqTmscxlwblWtmMM61gCLqsobikiSeqWPPaqSJM22vYD1VbUBoKrWjyWpSf4iyfIk9yVZmiStfFmSTye5PcnqJEckuT7Jw0k+0faZn+TBJFe0ntrrkuw4sfIk707y70nuTnJtkp0n2efyJKe35TVJLmz7r0pycCvfPcnNSe5JcimQce8/M8ldrbf40iTzWpu/n2SHJDu1HuVDZ//jlSRpy5ioSpIENwP7JnkoyeeSvGPcts9U1RFVdSjwWuDUcdteqKpjgUuAbwJLgEOBs5Ps3vY5CFhaVW8BngP+aHzFrff2AuDEqjocWAH8yTTavL7t/3ng3Fb2MeCOqjoMuAHYr9XxZuD3gGOqagHd4+I+WFXL236fAP4KuLKq7ptG3ZIkzSkTVUnSVq+qfgYsBBYD64CvJTm7bX5nku8lWQUcD/zGuLfe0F5XAfdX1drWK/sosG/b9kRV3dmWrwTePqH6o4FDgDuT3AucBew/jWZf315XAvPb8rGtDqrqW7RnmtM9Qm4hsLzVcQJwQNt2EfAuYBFdsipJ0tDNyl1/JUnqu6p6EVgGLGtJ6VlJrgY+RzfP84kkHwd2GPe2De31pXHLY+tj37ETbwYxcT3ALVV1xhY2eay+F/n/3+eT3XwiwBVVdf4k23YDdga2o/vdnt/CdkiSNOvsUZUkbfWSHJTkwHFFC4DH2JSUrm/zRk+fweH3azdrAjgDuGPC9u8CxyT59daWHZO8aQb1ANwOfLAd52Rg11Z+K3B6kje0bbslGeu1XQr8OfAV4C9nWK8kSbPKHlVJkroexb9LsguwEXgEWFxVzyb5At3Q3jXA8hkcezVd7+ylwMN0c0r/T1Wta8OMr0rymlZ8AfDQDOq6sB3nbuDfgMdbHQ8kuQC4Ock2wC+AJW0u7saq+mqSecB3khxfVbfNoG5JkmaNj6eRJGmOJJkP3NhuxCRJkqbJob+SJEmSpJFij6okSZIkaaTYoypJkiRJGikmqpIkSZKkkWKiKkmSJEkaKSaqkiRJkqSRYqIqSZIkSRopJqqSJEmSpJHyv4/DT3MMGNrCAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 1152x144 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.feature_selection import RFE\n",
    "select = RFE(RandomForestClassifier(n_estimators=100,random_state=42),n_features_to_select=40)\n",
    "select.fit(X_train,y_train)\n",
    "# 将选中的特征可视化：\n",
    "mask = select.get_support()\n",
    "plt.matshow(mask.reshape(1,-1),cmap='gray_r')\n",
    "plt.xlabel(\"Sample index\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test Score:0.930\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E:\\myAnaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n"
     ]
    }
   ],
   "source": [
    "X_train_rfe = select.transform(X_train)\n",
    "X_test_rfe = select.transform(X_test)\n",
    "\n",
    "score = LogisticRegression().fit(X_train_rfe,y_train).score(X_test_rfe,y_test)\n",
    "print(\"Test Score:{:.3f}\".format(score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 总结\n",
    "  <p>如果你不确定选择使用那些特征作为机器学习算法的输入,那么自动化特征选择可能特别有用,他有助于减少所需特征数量，加快预测速度，或允许构建可解释性更强的模型</p>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
